Goto

Collaborating Authors

 subsampled randomized hadamard transform


Optimal Iterative Sketching Methods with the Subsampled Randomized Hadamard Transform

Neural Information Processing Systems

Random projections or sketching are widely used in many algorithmic and learning contexts. Here we study the performance of iterative Hessian sketch for least-squares problems. By leveraging and extending recent results from random matrix theory on the limiting spectrum of matrices randomly projected with the subsampled randomized Hadamard transform, and truncated Haar matrices, we can study and compare the resulting algorithms to a level of precision that has not been possible before. Our technical contributions include a novel formula for the second moment of the inverse of projected matrices. We also find simple closed-form expressions for asymptotically optimal step-sizes and convergence rates. These show that the convergence rate for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon Gaussian random projections. These techniques may be applied to other algorithms that employ randomized dimension reduction.



Faster Ridge Regression via the Subsampled Randomized Hadamard Transform

Neural Information Processing Systems

We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations ($p \gg n$). The standard way to solve ridge regression in this setting works in the dual space and gives a running time of $O(n^2p)$. Our algorithm (SRHT-DRR) runs in time $O(np\log(n))$ and works by preconditioning the design matrix by a Randomized Walsh-Hadamard Transform with a subsequent subsampling of features. We provide risk bounds for our SRHT-DRR algorithm in the fixed design setting and show experimental results on synthetic and real datasets.


Review for NeurIPS paper: Optimal Iterative Sketching Methods with the Subsampled Randomized Hadamard Transform

Neural Information Processing Systems

Additional Feedback: Please find below a list of questions and comments: - 1) Did you experiment applying the truncated Walsh-Hadamard transform (Ailon & Liberty. In Discrete & Computational Geometry, 2009.) when using SRHT? - 2) lines 199-207 Could you comment on the fact that m is constrained to be greater than d? Would it be possible to achieve better performances with smaller m? Is there any theory about this? - 3) Could you please precisely give the references claiming that m \approx d \log(d) is prescribed for state-of-the-art algorithms and for which algorithm? - 4) Below Theorem 4.1, there is an explanation on why this additional assumption \mathbb{E} \[ \Delta_0 \Delta_0 \top \] (1/d) I_d is a "mild assumption". I did not understood the provided argument. Is this assumption often/easily met? - 5) lines 222-223 and 264-265 it is mentioned that "SRHT [...] contains less randomness, but is more structured and faster to generate" than Haar matrix.


New Subsampling Algorithms for Fast Least Squares Regression Yichao Lu2 Dean Foster

Neural Information Processing Systems

We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data (n p). We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation.


Faster Ridge Regression via the Subsampled Randomized Hadamard Transform

Neural Information Processing Systems

We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations ($p \gg n$). The standard way to solve ridge regression in this setting works in the dual space and gives a running time of $O(n 2p)$. Our algorithm (SRHT-DRR) runs in time $O(np\log(n))$ and works by preconditioning the design matrix by a Randomized Walsh-Hadamard Transform with a subsequent subsampling of features. We provide risk bounds for our SRHT-DRR algorithm in the fixed design setting and show experimental results on synthetic and real datasets. Papers published at the Neural Information Processing Systems Conference.


Improved Subsampled Randomized Hadamard Transform for Linear SVM

arXiv.org Machine Learning

Subsampled Randomized Hadamard Transform (SRHT), a popular random projection method that can efficiently project a $d$-dimensional data into $r$-dimensional space ($r \ll d$) in $O(dlog(d))$ time, has been widely used to address the challenge of high-dimensionality in machine learning. SRHT works by rotating the input data matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ by Randomized Walsh-Hadamard Transform followed with a subsequent uniform column sampling on the rotated matrix. Despite the advantages of SRHT, one limitation of SRHT is that it generates the new low-dimensional embedding without considering any specific properties of a given dataset. Therefore, this data-independent random projection method may result in inferior and unstable performance when used for a particular machine learning task, e.g., classification. To overcome this limitation, we analyze the effect of using SRHT for random projection in the context of linear SVM classification. Based on our analysis, we propose importance sampling and deterministic top-$r$ sampling to produce effective low-dimensional embedding instead of uniform sampling SRHT. In addition, we also proposed a new supervised non-uniform sampling method. Our experimental results have demonstrated that our proposed methods can achieve higher classification accuracies than SRHT and other random projection methods on six real-life datasets.


Sketch-based Randomized Algorithms for Dynamic Graph Regression

arXiv.org Machine Learning

A well-known problem in data science and machine learning is {\em linear regression}, which is recently extended to dynamic graphs. Existing exact algorithms for updating the solution of dynamic graph regression problem require at least a linear time (in terms of $n$: the size of the graph). However, this time complexity might be intractable in practice. In the current paper, we utilize {\em subsampled randomized Hadamard transform} and \textsf{CountSketch} to propose the first randomized algorithms. Suppose that we are given an $n\times m$ matrix embedding $M$ of the graph, where $m \ll n$. Let $r$ be the number of samples required for a guaranteed approximation error, which is a sublinear function of $n$. Our first algorithm reduces time complexity of pre-processing to $O(n(m + 1) + 2n(m + 1) \log_2(r + 1) + rm^2)$. Then after an edge insertion or an edge deletion, it updates the approximate solution in $O(rm)$ time. Our second algorithm reduces time complexity of pre-processing to $O \left( nnz(M) + m^3 \epsilon^{-2} \log^7(m/\epsilon) \right)$, where $nnz(M)$ is the number of nonzero elements of $M$. Then after an edge insertion or an edge deletion or a node insertion or a node deletion, it updates the approximate solution in $O(qm)$ time, with $q=O\left(\frac{m^2}{\epsilon^2} \log^6(m/\epsilon) \right)$. Finally, we show that under some assumptions, if $\ln n < \epsilon^{-1}$ our first algorithm outperforms our second algorithm and if $\ln n \geq \epsilon^{-1}$ our second algorithm outperforms our first algorithm.


Scalable Algorithms for Learning High-Dimensional Linear Mixed Models

arXiv.org Machine Learning

Linear mixed models (LMMs) are used extensively to model dependecies of observations in linear regression and are used extensively in many application areas. Parameter estimation for LMMs can be computationally prohibitive on big data. State-of-the-art learning algorithms require computational complexity which depends at least linearly on the dimension p of the covariates, and often use heuristics that do not offer theoretical guarantees. We present scalable algorithms for learning high-dimensional LMMs with sublinear computational complexity dependence on p. Key to our approach are novel dual estimators which use only kernel functions of the data, and fast computational techniques based on the subsampled randomized Hadamard transform. We provide theoretical guarantees for our learning algorithms, demonstrating the robustness of parameter estimation. Finally, we complement the theory with experiments on large synthetic and real data.


New Subsampling Algorithms for Fast Least Squares Regression

Neural Information Processing Systems

We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data ($n \gg p$). We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation. All three run in the order of size of input i.e. O($np$) and our best method, {\it Uluru}, gives an error bound of $O(\sqrt{p/n})$ which is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed design (with Randomized Hadamard preconditioning) as well as sub-Gaussian random design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if observations are i.i.d., sub-Gaussian then one can directly subsample without the expensive Randomized Hadamard preconditioning without loss of accuracy.